DETAILED ACTION

The instant application having Application No. 10/765883 has a total of 7 claims 
pending in the application, there are 3 Independent claims and 4 dependent claims, all 
of which are ready for examination by the examiner. 

INFORMATION CONCERNING OATH/DECLARATION 

Oath/Declaration 

1. Applicant's oath/declaration has been reviewed by Examiner and is found to
conform to the requirements prescribed in 37 CFR 1.63.

STATUS OF CLAIM FOR PRIORITY IN THE APPLICATION 

2. As required by MPEP § 201.14(c), acknowledgment is made of Applicant's claim
for priority based on an application filed in the Japanese Patent Office on November 28, 
2003. 



INFORMATION CONCERNING DRAWINGS 

Drawings 

3. Applicant's drawings submitted are acceptable for examination purposes.

ACKNOWLEDGMENT OF REFERENCES CITED BY APPLICANT

Information Disclosure Statement 

4. As required by MPEP § 609(c), Applicant's submission of the Information
Disclosure Statements dated January 29, 2004 and June 14, 2005 is acknowledged
by Examiner and cited references have been considered in the examination of the 
claims now pending. As required by MPEP § 609 c(2), a copy of the PTOL-1449 
initialed and dated by Examiner is attached to the instant office action. 

OBJECTIONS 

Specification 

5. The title of the invention is not descriptive. A new title is required that is clearly
indicative of the invention to which the claims are directed. 

The following title is suggested: "System And Method For A Storage Control 
Apparatus Using Information On Management Of Storage Resources".

Claims 

6. Claims 1, 3-4, and 6-7 are objected to because of the following
informalities: 

7. As per claims 1, 4, and 7, the phrase "data control I/O unit" in claims 1 and 4
and the phrases "channel control unit" and "disk control unit" in claim 7 are inconsistent 
with the specification. Examiner believes that said phrases refer to "data control I/O 
section", "channel control section", and "disk control section" respectively from the
specification. Applicant must be consistent throughout both the specification and claims
and choose either "section" or "unit" to describe the claimed inventions. 

8. As per claims 3 and 6, the phrase "an RAID" in line 3 of claim 3 and line 4 of
claim 6 should read "a RAID".

9. Also for claim 3, the phrase "include of a" in line 2 should read "include a".

10. As per claim 4, on line 11 there should be a semicolon after the word "request".
Examiner also suggests that lines 12 and 14 of claim 4 be indented to clearly signify new
limitations within the claim.

Appropriate corrections are required. 

REJECTIONS NOT BASED ON PRIOR ART 

Claim Rejections - 35 USC § 112

11. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

12. Claims 1, 4, and 7 are rejected under 35 U.S.C. 112, second paragraph, as
being indefinite for failing to particularly point out and distinctly claim the subject
matter which applicant regards as the invention. The phrase "can be
communicatively connected" in line 3 of claims 1, 4, and 7 does not clearly identify if the
"plurality of communication ports" are actually communicatively connected to the
"plurality of information processing apparatuses" or not.

REJECTIONS BASED ON PRIOR ART

Claim Rejections - 35 USC § 103

13. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

14. Claims 1-7 are rejected under 35 U.S.C. 103(a) as being obvious over 
Blumenau et al. (U.S. Patent 6,260,120) in view of Voigt et al. (U.S. Patent 
5,960,451). 

15. As per claim 1, Blumenau discloses a storage control apparatus comprising:

a data I/O control unit which has a plurality of communication ports that can be
communicatively connected with any of a plurality of information processing
apparatuses (col. 8, lines 24-28, 36-37, and 40-41; col. 9, lines 50-57; Fig. 1, elements
20, 21, 22-25, 27, and 35-36; and Fig. 2, elements 41-44), is communicatively
connected to a plurality of physical disk drives for storing data (col. 8, lines 24-35, 36-37,
and 41-44; and Fig. 1, elements 20, 26, 28-31, and 37-38), receives a data I/O
request for data stored in the physical disk drives from the information processing
apparatuses via the communication ports (col. 8, lines 48-49), and performs data 
read/write from/to the physical disk drives in accordance with the received data I/O 
request (col. 8, lines 56-60); It should be noted that "cached storage subsystem" is 
analogous to "storage control apparatus", "storage controller" is analogous to "data I/O 
control unit", and "hosts" are analogous to "information processing apparatuses".

a first memory storing a data which is read/written among the data stored in the
physical disk drives (col. 8, lines 36-37, 48-54, and 60-65; and Fig. 1, element 32); and

a second memory storing information on management of storage resources
including the communication ports and the physical disk drives (col. 27, lines 23-33 and
Fig. 25, element 282); It should be noted that "virtual ports" are analogous to
"communication ports". It should also be noted that the logical storage volumes directly
correspond to the storage devices (i.e. physical disk drives), see col. 8, lines 28-29.

wherein in response to reception of a transmission request of the information on 
management of the storage resource from a user via a user interface, an identifier of the
communication port and an identifier of the physical disk drive are transmitted to said
user interface (col. 30, line 59 - col. 31, line 2 and Fig. 30, elements 346 and 347). It
should be noted that "clicking on it with a pointing device" is analogous to "transmission 
request" and "system administrator" is analogous to "user". Also, see citation note 
directly above regarding logical storage volumes. 

Blumenau does not disclose expressly a second memory storing information on 
management of storage resources including a storage capacity of the first memory 
allocated for each user using the information processing apparatuses;

wherein in response to reception of a transmission request of the information on
management of the storage resource from a user via a user interface a storage capacity
of the first memory which have been allocated for said user are transmitted to said user
interface.

Voigt discloses a second memory storing information on management of storage
resources including a storage capacity of the first memory allocated for each user using
the information processing apparatuses (col. 6, lines 13-16; col. 7, lines 30-31; and Fig.
2, element 56); 

wherein in response to reception of a transmission request of the information on 
management of the storage resource from a user via a user interface a storage capacity 
of the first memory which have been allocated for said user are transmitted to said user 
interface (col. 6, lines 13-17; col. 7, lines 5-9 and 30-36; Fig. 4, elements 90, 104, and
106). It should be noted that "as the administrator moves the sliding bar" acts as a 
"transmission request". 

Blumenau and Voigt are analogous art because they are from the same field of
endeavor, that being storage systems that use logical storage units (LUNs) with 
graphical user interfaces (GUIs). 

At the time of the invention it would have been obvious to a person of ordinary 
skill in the art to combine Voigt's LUN cache storage capacity indicator and GUI with 
Blumenau's cached storage subsystem and GUI. 

The motivation for doing so would have been because in a system with fixed 
physical capacity, it would be beneficial to determine how much usable capacity can be 
afforded simultaneously for each logical unit type, given the diversity of consumption 
rates among the various types (Voigt, col. 3, lines 15-19). 

Therefore, it would have been obvious to combine Blumenau and Voigt for the 
benefit of obtaining the invention as specified in claim 1.

16. As per claims 2 and 5, Blumenau discloses information on management of the
storage resources includes: 

first correlation between the physical disk drive and a data amount which can 
be stored in the first memory among the data stored in the physical disk drive (col. 8, 
lines 56-62); It should be noted that when taking the broadest interpretation of the claim 
language it is clear that the limitations of the claim do not identify what "correlation" 
specifically entails or the size of the "data amount". Blumenau discloses reading data
from the storage devices and writing the data (amount not specified, but nonetheless 
still a discrete amount of data) back to cache memory, thus disclosing a correlation 
between the storage devices and cache memory. 

and information representing a second correlation between the first correlation 
and the communication port (col. 8, lines 62-65). Again, it should be noted that when 
taking the broadest interpretation of the claim language it is clear that the limitations of 
the claim do not identify what "correlation" specifically entails. Blumenau discloses that
the data used in the "first correlation" (see citation directly above) is written to the cache 
memory by the port adapters (which contain at least two ports, see col. 9, lines 54-55), 
thus disclosing a correlation between the first correlation and the communication port. 

17. As per claims 3 and 6, Blumenau discloses physical disk drives include a
plurality of hard disk drives constituting a RAID (col. 9, lines 16-19).

18. As per claim 4, Blumenau discloses a storage control apparatus comprising: a
data I/O control unit which has a plurality of communication ports that can be
communicatively connected with any of a plurality of information processing
apparatuses (col. 8, lines 24-28, 36-37, and 40-41; col. 9, lines 50-57; Fig. 1, elements
20, 21, 22-25, 27, and 35-36; and Fig. 2, elements 41-44), is communicatively
connected to a plurality of physical disk drives for storing data (col. 8, lines 24-35, 36-37,
and 41-44; and Fig. 1, elements 20, 26, 28-31, and 37-38), receives a data I/O
request for data stored in the physical disk drives from the information processing 
apparatuses via the communication ports (col. 8, lines 48-49), and performs data 
read/write from/to the physical disk drives in accordance with the received data I/O 
request (col. 8, lines 56-60); It should be noted that "cached storage subsystem" is 
analogous to "storage control apparatus", "storage controller" is analogous to "data I/O 
control unit", and "hosts" are analogous to "information processing apparatuses". 

a first memory storing a data which is read/written among the data stored in the 
physical disk drives (col. 8, lines 36-37, 48-54, and 60-65; and Fig. 1, element 32); and 

a second memory storing information on management of storage resources 
including the communication ports and the physical disk drives (col. 27, lines 23-33 and 
Fig. 25, element 282); It should be noted that "virtual ports" are analogous to 
"communication ports". It should also be noted that the logical storage volumes directly 
correspond to the storage devices (i.e. physical disk drives), see col. 8, lines 28-29. 

said method comprising the steps of: 

receiving a transmission request of the information on management of the 
storage resource from a user via a user interface (col. 30, lines 59-62). It should be 
noted that "clicking on it with a pointing device" is analogous to "transmission request" 
and "system administrator" is analogous to "user".

and in response to said receiving step, transmitting an identifier of the
communication port and an identifier of the physical disk drive (col. 30, line 62 - col. 31,
line 2 and Fig. 30, elements 346 and 347). Also, see citation note directly above 
regarding logical storage volumes. 

Blumenau does not disclose expressly a second memory storing information on 
management of storage resources including a storage capacity of the first memory 
allocated for each user using the information processing apparatuses;

in response to said receiving step, transmitting a storage capacity of the first 
memory which have been allocated for said user to said user interface. 

Voigt discloses a second memory storing information on management of storage
resources including a storage capacity of the first memory allocated for each user using 
the information processing apparatuses (col. 6, lines 13-16; col. 7, lines 30-31; and Fig. 
2, element 56); 

in response to said receiving step, transmitting a storage capacity of the first 
memory which have been allocated for said user to said user interface (col. 6, lines 13-17;
col. 7, lines 5-9 and 30-36; Fig. 4, elements 90, 104, and 106). It should be noted
that "as the administrator moves the sliding bar" acts as a "transmission request".

19. As per claim 7, Blumenau discloses a storage control apparatus comprising:

a channel control unit which has a plurality of communication ports that can be 
communicatively connected with any of a plurality of information processing
apparatuses and receives a data I/O request for data stored in physical disk drives
including a plurality of hard disk drives constituting an RAID (col. 8, lines 24-28, 36-37,
40-41, and 48-49; col. 9, lines 16-19 and 50-57; Fig. 1, elements 20, 21, 22-25, 27, and
35-36; and Fig. 2, elements 41-44); It should be noted that "cached storage subsystem" 
is analogous to "storage control apparatus", "port adapter" is analogous to "channel 
control unit", and "hosts" are analogous to "information processing apparatuses". 

a disk control unit which is communicatively connected to the physical disk drives 
and performs data read/write from/to the physical disk drives according to the data I/O
request (col. 8, lines 24-35, 36-37, 41-44, 56-60; and Fig. 1, elements 20, 26, 28-31,
and 37-38); It should be noted that "storage adapter" is analogous to "disk control unit". 

a first memory storing a data which is read/written among the data stored in the 
physical disk drives (col. 8, lines 36-37, 48-54, and 60-65; and Fig. 1, element 32); and

a second memory storing information on management of storage resources
including the communication ports and the physical disk drives (col. 27, lines 23-33 and 
Fig. 25, element 282); It should be noted that "virtual ports" are analogous to 
"communication ports". It should also be noted that the logical storage volumes directly 
correspond to the storage devices (i.e. physical disk drives), see col. 8, lines 28-29. 

wherein in response to reception of a transmission request of the information on 
management of the storage resource from a user via a user interface, an identifier of the 
communication port and an identifier of the physical disk drive are transmitted to said 
user interface (col. 30, line 59 - col. 31, line 2 and Fig. 30, elements 346 and 347). It
should be noted that "clicking on it with a pointing device" is analogous to "transmission 
request" and "system administrator" is analogous to "user". Also, see citation note 
directly above regarding logical storage volumes. 



Application/Control Number: 10/765,883 Page 12 

Art Unit: 2185 

Blumenau does not disclose expressly a second memory storing information on
management of storage resources including a storage capacity of the first memory 
allocated for each user using the information processing apparatuses; 

wherein in response to reception of a transmission request of the information on 
management of the storage resource from a user via a user interface a storage capacity
of the first memory which have been allocated for said user are transmitted to said user 
interface. 

Voigt discloses a second memory storing information on management of storage 
resources including a storage capacity of the first memory allocated for each user using 
the information processing apparatuses (col. 6, lines 13-16; col. 7, lines 30-31; and Fig.
2, element 56); 

wherein in response to reception of a transmission request of the 
information on management of the storage resource from a user via a user interface a 
storage capacity of the first memory which have been allocated for said user are 
transmitted to said user interface (col. 6, lines 13-17; col. 7, lines 5-9 and 30-36; Fig. 4, 
elements 90, 104, and 106). It should be noted that "as the administrator moves the 
sliding bar" acts as a "transmission request". 



RELEVANT ART CITED BY THE EXAMINER

The following prior art made of record and not relied upon is cited to establish the
level of skill in Applicant's art and those arts considered reasonably pertinent to 
Applicant's disclosure. See MPEP 707.05(e). 

The following reference discloses a storage area network (SAN) comprised of 
a RAID array.

U.S. Patent Application Publication Number 

2003/0093501 

Conclusion 
STATUS OF CLAIMS IN THE APPLICATION 

The following is a summary of the treatment and status of all claims in the 
application as recommended by MPEP 707.07(i):

CLAIMS REJECTED IN THE APPLICATION 

Per the instant office action, claims 1-7 have received a first action on the merits 
and are subject of a first action non-final. 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Arpan P. Savia whose telephone number is (571) 272-1077.
The examiner can normally be reached on M-F 8:30-5.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Donald Sparks can be reached on (571) 272-4201. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for
published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR only.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 





Arpan Savia 
Assistant Examiner 
11/17/05

Abstract

Disk Arrays or RAIDs are a widely accepted I/O system
architecture, useful for a wide range of applications. Before
a RAID can be used, one needs to configure the RAID
to select parameters such as the RAID level to use, the stripe
unit to use, how large of a cache to use, and so on. Selecting
these configuration parameters can be quite complex, yet no
aids are available today to help the user configure his RAID.
The optimal selection of the parameters strongly depends on 
the specific workload characteristics of the application. In 
this paper, we describe a configuration tool called Raidtool 
which is intended to support the systems designer in the selection
of the configuration parameters. Our approach consists
of three basic steps. The first step is to collect a trace
of I/Os while running one or more typical applications. In
the second step, this trace data is analyzed to determine the 
workload characteristics of the applications. In the third 
and final step, we use a simulator to evaluate the different 
RAID controller configurations. 



1. Introduction 

Disk Arrays or RAIDs have become a widely accepted
I/O architecture. Even relatively cheap single card RAID 
controllers allow for a wide range of configuration param- 
eters such as RAID level, size of stripe unit, size of op- 
tional caches, etc., yet no aids are available today to help 
the user configure his RAID. The optimal parameter settings
strongly depend on the characteristics of the application's
I/O workload. In this paper we describe a software
tool called Raidtool which supports the system designer in 
the process of configuring a RAID controller. 

Our main design goal was to come up with an easy to use 
and fast software tool which allows the efficient evaluation 
of different configuration alternatives.

The first step in the configuration process is to collect
workload information by means of I/O traces. In a second
step these traces are analyzed to identify the key workload
characteristics of the application. The third step is the simulation
of different configuration alternatives. These simulations
are driven by a synthetic workload generator, which
generates a workload according to the characteristics determined
in the second step.

The rest of this paper describes our three step approach 
in some detail and is organized as follows: in Section 2
we discuss the features of a disk array controller modeled 
by Raidtool. Section 3 describes our overall approach; sec- 
tion 4 describes the components of the workload description 
provided by the analyzer to the simulator. In section 5 we 
discuss the simulator in some detail. Finally, in section 6 
we show some results from our tool compared to a fully 
trace-driven approach. 

2. RAID Architecture Model 

Raidtool accepts a description of the RAID controller 
card as its input. In order for our tool to support a wide
range of different controller cards, we came up with a very
generic and highly configurable controller card description.
A typical RAID controller card contains a microproces- 
sor. The card also contains a certain amount of DRAM 
for instructions and operands (Data Buffer in Figure 1). 
This DRAM is optimized for reading and writing small 
chunks of data (i.e., instructions and operands). In addi- 
tion there is typically another DRAM for data caching. A 
RAID controller card may optionally be equipped with non- 
volatile RAM (NVRAM); while the regular volatile cache 
is used for read caching only, this NVRAM is used for write 
caching. All these components of a controller card are con- 
nected by an internal bus with a certain bandwidth. The 
characteristics (speed, bandwidth, size, etc.) of the compo- 
nents as well as the bus are configurable in our tool. 

The RAID controller connects to a number of disks. For
this controller-disk connection, Raidtool currently supports
SCSI-2 [1] or SSA [2] connections. The performance characteristics
of these connections are again configurable.

Figure 1. Principal Disk Array Architecture

2.1. RAID Features 

In our model, a RAID controller card is configurable for 
a (unlimited) number of RAID groups. Each group may 
be configured for a different RAID level; we support all 
the usual RAID levels [5, 3]. Each group may consist of 
different disk types; within one group we assume the same 
disk type. Disks are configurable and also support track 
buffering. Examples of parameters supported for a RAID 
group are the RAID level, stripe unit, group size, etc. See 
our research report [7] for a complete list. 

2.2. NVRAM

RAIDs with the optional NVRAM installed can do fast 
write [4]. The host is notified about the completion of a 
write request as soon as the data has arrived in the NVRAM 
cache. The physical write to disk (destage) is performed 
asynchronously at a later point in time. 

Our RAID model assumes that when a configuration 
with NVRAM is chosen, all writes are fast writes. Writes 
that arrive when the NVRAM is full are delayed until space 
becomes available. 

3. The Analyzer/Simulation Approach 
3.1. Overview

Raidtool works in three main steps (Figure 2). The first
step is to collect information about the typical workload in
terms of I/O trace data. In the second step, this trace data is
analyzed and a specific workload description is generated.
This workload description is used to generate a synthetic
load with comparable characteristics. Finally, in the third
step, we simulate different disk array controller configurations.
This simulation is driven by a synthetic load, which
is generated according to the workload characteristics determined
by the trace analyzer. The simulator is run several
times, to simulate various I/O arrival rates, including rates
faster than the actual I/O rate in the original trace.

Figure 2. Analyzer/Simulation Approach

There are two main reasons for taking this analyzer/simulation
approach instead of running the simulation
directly from the I/O traces. First, the tool is intended for
interactive use, so performance is a major concern. The analyzer
needs to be run only once for a given configuration,
whereas the simulator must be run several times with different
arrival rates, so we improve performance by performing
as much of the work as possible (e.g. cache modeling) in the
analyzer. The second reason is that we were not satisfied
that any of the known techniques for speeding up a given
trace did an adequate job of preserving the dependencies of
the original I/O requests. Speeding up a trace is required in
order to perform simulation at I/O rates higher than that of
the original trace.

Instead of the simulation, one could try to use an analytical
model to evaluate the configuration alternatives. Even
though an analytical model allows for very quick evaluations,
this approach is not practical because of the inherent
complexity of the system being modeled. One could try to
simplify the model to make it more tractable, but most of
these simplifications are hard to justify or may even invalidate
the analytical approach.


3.2. The Analyzer 

The analyzer basically takes three items of information: 
a description of the traced system, a description of the RAID 
controller configuration, and the trace data itself. Based on
this information, the analyzer identifies the key workload
characteristics which are important for the configuration of 
a RAID controller. 

3.3. The Simulator

Using the workload description from the analyzer, the 
simulator generates a synthetic workload and models the 
selected RAID configuration. The synthetic workload con- 
sists of a number of generated I/O requests which are sub- 
mitted to the simulated RAID controller. These I/O requests 
are generated using exponential arrival times. The overall 
arrival rate must be specified by the user for each run of the 
simulator. The simulator uses the workload characteristics 
obtained from the output of the analyzer to generate the size 
and address of each I/O request. 
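
The arrival process described above can be illustrated with a minimal sketch. It assumes a Poisson arrival stream, i.e. independent exponentially distributed gaps between requests; the function name `generate_arrival_times`, the seed, and the example rate of 50 req/s are illustrative choices, not part of Raidtool.

```python
import random


def generate_arrival_times(rate_per_s, count, seed=0):
    """Return `count` arrival timestamps in seconds with exponential gaps."""
    rng = random.Random(seed)  # fixed seed keeps simulator runs repeatable
    now = 0.0
    arrivals = []
    for _ in range(count):
        # expovariate(lambd) draws a gap whose mean is 1 / lambd seconds.
        now += rng.expovariate(rate_per_s)
        arrivals.append(now)
    return arrivals


# Hypothetical run: 10,000 requests at a user-chosen rate of 50 req/s.
arrivals = generate_arrival_times(rate_per_s=50.0, count=10_000)
mean_gap = arrivals[-1] / len(arrivals)
```

Rerunning the generator with a different `rate_per_s` corresponds to one simulator run per user-specified arrival rate.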

4. The Workload Description 

This section describes the elements of the workload de- 
scription provided by the analyzer to the simulator. 

4.1. Seek Distribution

The main workload characteristic provided by the analyzer
is a set of distributions of logical seek distances, which
allows us to keep track of the spatial locality of the I/O- 
requests. The analyzer concatenates the address spaces of 
the traced disks to one large logical address space, which 
allows us to determine the logical block address of an I/O 
request. A logical seek is the distance between the logical 
block addresses of consecutive I/O-requests in the trace. 
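
As a rough illustration of the logical address space, the sketch below concatenates the traced disks and computes the logical seek between consecutive requests. The trace layout (a list of `(disk_index, block)` pairs) and the use of a signed distance are assumptions made for the example; the text does not specify either.

```python
from itertools import accumulate


def logical_seeks(disk_sizes, trace):
    """Compute signed logical seek distances for a trace.

    disk_sizes: number of blocks on each traced disk, in concatenation order.
    trace: (disk_index, block_on_disk) pairs in arrival order.
    """
    # Base address of each disk inside the concatenated logical space.
    bases = [0] + list(accumulate(disk_sizes))[:-1]
    lbas = [bases[disk] + block for disk, block in trace]
    # A logical seek is the difference between consecutive logical addresses.
    return [nxt - cur for cur, nxt in zip(lbas, lbas[1:])]


# Hypothetical two-disk system with 100 and 200 blocks.
seeks = logical_seeks([100, 200], [(0, 10), (1, 5), (0, 20)])
```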

The analyzer divides the logical address space into mul- 
tiple address ranges, and generates a separate logical seek 
distribution for each address range, as shown in figure 3.
This method was chosen because the alternatives of a sim- 
ple average seek distance or a single overall seek distribu- 
tion (independent of position) were found to be inadequate. 
We discuss this further in our research report [7].
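
One way to picture the per-range distributions is the following sketch, which buckets each seek by the address range in which it starts. Equal-sized ranges and attribution by starting address are assumptions of this example; the research report [7] may define the ranges differently.

```python
from collections import Counter, defaultdict


def seek_distributions(lbas, total_blocks, num_ranges):
    """Return {range_index: {seek_distance: probability}}."""
    range_size = -(-total_blocks // num_ranges)  # ceiling division
    counts = defaultdict(Counter)
    for start, end in zip(lbas, lbas[1:]):
        # Attribute the seek to the range that contains its starting address.
        counts[start // range_size][end - start] += 1
    return {
        index: {seek: n / sum(c.values()) for seek, n in c.items()}
        for index, c in counts.items()
    }


# Hypothetical 300-block logical space split into three ranges.
dists = seek_distributions([10, 105, 20, 30, 110], total_blocks=300, num_ranges=3)
```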




Figure 3. Multiple seek distributions, one per 
address range 



4.2. Request Size Distribution 

As with the logical seek information, providing only the 
average request size was deemed insufficient to capture the 
original distribution of request sizes. In addition, sizes for
read and write requests may follow extremely different dis- 
tributions. Thus, the analyzer generates a request size distri- 
bution for read and write requests separately. For each pos- 
sible request size, the probability that a request has this size 
is given. From this request size/probability information the 
simulator calculates the Cumulative Distribution Function 
(CDF) of the request size distribution and generates pseudo 
random request sizes according to this distribution.
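
The CDF-based generation of request sizes can be sketched with inverse transform sampling. The size/probability table below is made up for illustration, and `make_size_sampler` is a hypothetical helper, not a Raidtool interface.

```python
import bisect
import random
from itertools import accumulate


def make_size_sampler(size_probs, seed=0):
    """Build a sampler from a {request_size: probability} table."""
    sizes = sorted(size_probs)
    cdf = list(accumulate(size_probs[s] for s in sizes))  # running sum = CDF
    rng = random.Random(seed)

    def sample():
        u = rng.random() * cdf[-1]  # scaling tolerates tables that sum to ~1
        index = bisect.bisect_right(cdf, u)  # first size whose CDF exceeds u
        return sizes[min(index, len(sizes) - 1)]

    return sample


# Hypothetical read-size table (sizes in KB); writes would get their own table.
read_size = make_size_sampler({4: 0.5, 8: 0.3, 64: 0.2})
draws = [read_size() for _ in range(20_000)]
```

Keeping one sampler per request type mirrors the separate read and write distributions produced by the analyzer.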

4.3. Interarrival Time Distribution

The next important workload characteristic is the distribution
of the interarrival time. The simplest (and most commonly
used) approach is to calculate the average interarrival
time from the original trace and assume exponential interarrivals.
Unfortunately, we observed that this assumption
is not generally true. Figure 4 shows the arrival rate of requests
as a function of evolving time for a real I/O trace.
The overall average arrival rate for this trace is about 5.2 
[req/s]. During most of the time the trace has an arrival rate 
close to 1 [req/s]; only during two peak periods is the arrival 
rate significantly higher. The overall observed maximum 
arrival rate is larger than 400 [req/s]. It is quite inappropriate
to describe this behavior with an exponential interarrival
time distribution whose mean is 0.19 [s].¹

Figure 4. Arrival rate vs. evolving time for an
I/O trace



The bottom line is that the assumption of exponential
interarrival times is not always appropriate because some
traces show a higher degree of "burstiness" than we can
model with an exponential distribution.
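
One common way to quantify this mismatch, offered here as an illustration rather than as the method used in the paper, is the coefficient of variation (CV) of the interarrival times: for an exponential distribution the CV is exactly 1, so a value well above 1 indicates burstiness. The trace below is synthetic.

```python
import statistics


def interarrival_cv(arrival_times):
    """Coefficient of variation (std / mean) of the gaps between arrivals."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


# Mean interarrival time implied by the overall rate of 5.2 req/s.
mean_gap_s = 1.0 / 5.2

# Made-up bursty trace: nine gaps of 10 ms followed by one 10 s pause, repeated.
bursty = [0.0]
for _ in range(20):
    for gap in [0.01] * 9 + [10.0]:
        bursty.append(bursty[-1] + gap)

bursty_cv = interarrival_cv(bursty)
```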



¹ The interarrival time of 0.19 [s] corresponds to an arrival rate of 5.2
[req/s].

4.4. Analyzing the Peak Periods of a Trace

Even though the assumption of exponential arrivals may 
not be appropriate for an entire trace, we may find time in- 
tervals in the trace which are better behaved in terms of their 
interarrival time distribution. In addition, the customer is 
only interested in the high load phases of a trace, since low 
load periods do not cause any performance problems. We 
therefore developed a simple filter, which takes a trace as 
input and separates the trace periods with high load. 
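
The filter itself is not specified in this section, so the following is only a plausible sketch under stated assumptions: the trace is cut into fixed-length time windows, and adjacent windows whose arrival rate reaches a threshold are merged into one high-load period. The names `high_load_periods`, `window_s`, and `min_rate` are hypothetical.

```python
def high_load_periods(arrival_times, window_s, min_rate):
    """Return (start, end) time spans whose windowed arrival rate >= min_rate."""
    if not arrival_times:
        return []
    origin = arrival_times[0]
    counts = {}
    for t in arrival_times:
        window = int((t - origin) // window_s)
        counts[window] = counts.get(window, 0) + 1

    runs = []  # [first_window, last_window] runs of busy windows
    for window in sorted(counts):
        if counts[window] / window_s < min_rate:
            continue
        if runs and runs[-1][1] == window - 1:
            runs[-1][1] = window  # extend the adjacent busy run
        else:
            runs.append([window, window])
    return [(origin + a * window_s, origin + (b + 1) * window_s) for a, b in runs]


# Made-up trace: two quiet requests, a burst of 100 requests, one late request.
trace = [0.0, 5.0] + [10 + i * 0.01 for i in range(100)] + [20.0]
peaks = high_load_periods(trace, window_s=1.0, min_rate=50.0)
```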

Figure 5. Arrival rate vs. evolving time for the
first peak period of the AFS-trace

Figure 5 shows the average arrival rate as a function of 
evolving time for the first peak identified by the above de- 
scribed peak determination algorithm. The figure shows 
that we still observe a certain fluctuation in the average
arrival rate, but the trace is much better behaved with
respect to the arrival rate, so it is reasonable to use an
exponential distribution for the interarrival times. To account for
minor variations in the arrival rate over time, the analyzer
provides the simulator with a different arrival rate for each
interval (where an interval is some fixed number of I/Os).
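
A minimal sketch of the per-interval rate computation, assuming the trace is available as a sorted list of arrival timestamps in seconds. The function name and the handling of a trailing partial interval (it is dropped) are assumptions of this example.

```python
def interval_arrival_rates(timestamps, ios_per_interval):
    """Arrival rate [req/s] for each interval of a fixed number of I/Os.

    Each interval spans ios_per_interval interarrival gaps; a trailing
    partial interval is ignored.
    """
    rates = []
    last_start = len(timestamps) - ios_per_interval
    for i in range(0, last_start, ios_per_interval):
        span = timestamps[i + ios_per_interval] - timestamps[i]
        if span > 0:  # skip degenerate intervals with identical timestamps
            rates.append(ios_per_interval / span)
    return rates

# The second half of this hypothetical trace arrives twice as fast.
print(interval_arrival_rates([0, 1, 2, 3, 4, 4.5, 5, 5.5, 6], 4))
```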

4.5. Cache Hit Ratios

The analyzer generates hit ratios for the different RAM 
and NVRAM caches. We distinguish three types of cache 
hits: 

• Read hit: A read request finds the requested data in the 
regular RAM cache or in the NVRAM cache. 

• Write hit on clean: A write request finds a clean copy 
(value same as on disk) of the data to be updated in the 
RAM cache. This may save a read operation during 
later destage of the data from cache.

• Write hit on dirty: A write request finds the data to be 
updated in NVRAM cache. This old version is over- 
written. This saves the destaging of the old version. 
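
The three hit types can be expressed as a small classification routine. This is a sketch only: it assumes the RAM cache and the NVRAM cache are each represented as a set of block numbers, which is a simplification chosen for the example and not a description of the analyzer's data structures.

```python
def classify_access(op, block, ram_clean, nvram_dirty):
    """Classify one request against the two caches.

    op          : "read" or "write"
    ram_clean   : set of blocks held as clean copies in the RAM cache
    nvram_dirty : set of blocks held as dirty copies in the NVRAM cache
    """
    if op == "read":
        if block in ram_clean or block in nvram_dirty:
            return "read_hit"
        return "miss"
    if op == "write":
        if block in nvram_dirty:
            return "write_hit_on_dirty"   # old version is overwritten
        if block in ram_clean:
            return "write_hit_on_clean"   # may save a read at destage time
        return "miss"
    raise ValueError(f"unknown operation: {op!r}")
```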

In addition, the analyzer generates hit ratios for the disk 
buffers. For each disk we get a separate buffer hit ratio. 
Figure 6. A trace containing interleaved
thread I/Os



4.6. Destaging

For configurations with NVRAM, all writes are sub- 
mitted to the NVRAM and, at some later point in time, 
destaged to disk. Since destaging is part of the cache logic, 
it is modeled in the analyzer, which supplies the simulator 
with the distribution of the size and physical seek distance 
of destage I/O operations. Each destaging operation is a 
read or a write to one disk. Given the destage I/O infor- 
mation from the analyzer, the simulator adds destaging load 
to the disks. See [7] for a full description of the algorithm 
used. 

4.7. Thread Lists 

The logical seek distribution approach described in
Section 4.1 produces fairly accurate results for most workloads.
However, there are certain workloads which are not de- 
scribed well by this approach. These are workloads which 
consist of multiple series or "threads" of interleaved sequen- 
tial I/Os. An example is shown in Figure 6. 

The difficulty with sequential threads arises due to the 
fact that a pair of successive I/Os in a thread are separated 
by unrelated I/Os of another thread. This causes large logi- 
cal seeks between the two I/Os of the thread. At such high 
seek values, there is not enough granularity in the seek dis- 
tributions to enable the second I/O of the pair to return ex- 
actly to the point where the first left off. 

To solve this problem, we decided to use a heuristic to 
explicitly identify threads of sequential I/Os. This heuristic 
identifies two I/Os as part of the same thread if the second's 
logical block address immediately follows the first's (i.e., 
the second I/O's LBA is equal to the first I/O's LBA plus 
the first I/O's request size in blocks). There must be no
more than a small number of I/Os intervening between the 
two I/Os of the thread. Using this technique, the analyzer
outputs multiple thread lists; each list describes the threads 
active during each interval of some number of I/Os. Each 
thread in the list is described by the number of I/Os in the 
thread, the starting block number, the total size of all the
I/Os in the thread, etc. For each I/O it generates, the simu- 
lator determines whether the I/O is a thread I/O, and which 
thread it is a part of, based on the total number of thread 
I/Os that occurred during the interval. 
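
The thread identification heuristic can be sketched as follows. The code assumes each I/O is a pair (LBA, size in blocks) and uses a hypothetical parameter `max_gap` for the "small number" of intervening I/Os; the value 8 is an arbitrary choice, since the text gives no figure.

```python
def find_threads(ios, max_gap=8):
    """Group I/Os into sequential threads.

    ios     : list of (lba, size_in_blocks) in trace order
    max_gap : assumed limit on the number of intervening I/Os
    Returns a list of threads, each a list of indices into ios.
    """
    threads = []
    expected = {}  # next expected LBA -> thread waiting for it
    for idx, (lba, size) in enumerate(ios):
        thread = expected.pop(lba, None)
        if thread is not None and idx - thread["last"] - 1 <= max_gap:
            thread["members"].append(idx)   # continues an existing thread
        else:
            thread = {"members": [idx]}     # starts a candidate thread
            threads.append(thread)
        thread["last"] = idx
        expected[lba + size] = thread
    # A single I/O on its own is not a sequential thread.
    return [t["members"] for t in threads if len(t["members"]) > 1]

# Two interleaved sequential streams plus one unrelated I/O.
trace = [(0, 8), (1000, 8), (8, 8), (1008, 8), (16, 8), (5000, 4)]
print(find_threads(trace))
```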

In general, each sequential thread I/O in the original 
trace is dependent on its predecessor, and does not start un- 
til its predecessor has completed. To capture this behavior, 
the simulator keeps track of which threads are still active, 
i.e. have an I/O outstanding which has not yet completed. 
Only a thread that is not already active may be chosen for 
a newly-generated I/O to be a part of. If all threads are al- 
ready active, the I/O is delayed until one or more threads 
become available. 
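
One way to picture this bookkeeping is a selector that hands out only idle threads and reports when none is available. The class and method names below are hypothetical, and choosing uniformly at random among the idle threads is an assumption; the text does not say how the simulator picks a thread.

```python
import random

class ThreadSelector:
    """Tracks which sequential threads have an I/O outstanding."""

    def __init__(self, thread_ids, seed=0):
        self.idle = sorted(thread_ids)
        self.active = set()
        self.rng = random.Random(seed)

    def start_io(self):
        """Pick an idle thread for a new I/O, or None if all are active.

        A None result means the caller has to delay the I/O until
        complete_io() frees a thread.
        """
        if not self.idle:
            return None
        tid = self.idle.pop(self.rng.randrange(len(self.idle)))
        self.active.add(tid)
        return tid

    def complete_io(self, tid):
        """Mark the outstanding I/O of thread tid as completed."""
        self.active.remove(tid)
        self.idle.append(tid)
```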

The simulator simulates a given configuration at a vari- 
ety of arrival rates. Increasing the arrival rate compared to 
the original trace will also change the distribution of thread 
I/Os. Thus, it is necessary for the simulator to modify the 
list of thread I/Os provided by the analyzer when a different
arrival rate is used. To increase the arrival rate by a factor of x,
we merge x adjacent lists of threads into one. This makes
more threads available for the simulator to choose from at 
any given instant of time, similar to the effect that would 
occur from speeding up the trace. 
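
The merging step can be sketched in a few lines, assuming x is a positive integer and each per-interval thread list is a plain Python list. How a non-integer speed-up would be handled is not stated in the text and is not covered here.

```python
from itertools import chain

def merge_thread_lists(lists, x):
    """Merge every x adjacent per-interval thread lists into one.

    lists : list of per-interval thread lists, in trace order
    x     : integer speed-up factor (assumed to be >= 1)
    A final group with fewer than x lists is merged as is.
    """
    return [list(chain.from_iterable(lists[i:i + x]))
            for i in range(0, len(lists), x)]

print(merge_thread_lists([["a"], ["b", "c"], ["d"], ["e"], ["f"]], 2))
```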

5. RAID Controller Simulation 

5.1. Cache Configurations 

The simulator does no cache simulation, but determines 
randomly whether a given I/O request is a cache hit, with 
probability equal to the appropriate hit ratio provided by 
the analyzer. The simulator supports 4 basic cache config- 
urations: no cache, read cache only, write cache only, and 
read/write cache. 
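
The random hit decision amounts to a Bernoulli draw with the hit ratio as its success probability. A minimal sketch, with a hypothetical function name and an explicitly seeded generator so that runs are repeatable:

```python
import random

def is_cache_hit(hit_ratio, rng):
    """Return True with probability hit_ratio (a value in [0, 1])."""
    return rng.random() < hit_ratio

rng = random.Random(1)  # fixed seed, chosen arbitrarily for the example
hits = sum(is_cache_hit(0.3, rng) for _ in range(100_000))
print(hits / 100_000)   # close to the requested hit ratio of 0.3
```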

5.2. I/O Scheduling 

An I/O request generated by the simulator is converted to 
the appropriate disk-level I/Os based on the RAID config- 
uration chosen for simulation. The following sections de- 
scribe the timing of these disk-level I/Os. 

5.2.1. Timing of Read I/Os 

Figure 7 illustrates the timing of a read miss, as it is realized
in the simulation program. Components of the read I/O
response time are: t_CPU,ReadMiss, which denotes the CPU
overhead for a read miss, and t_Disk, which denotes the disk
service time. The disk service time may be further broken
up as shown, into the queueing delay, seek time, head settle 
time, rotational latency, transfer time, and controller over- 
head. The figure shows a read request which is serviced by 
five disks. All five I/O operations are issued at the same 
time and the "slowest" of these five disks determines the 
response time of the entire read request. 
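
The timing rule can be written down directly. The sketch assumes the CPU overhead and the disk phase are sequential, so that they add up, and that all times are in the same unit; the function names and sample values are illustrative.

```python
def disk_service_time(queue, seek, settle, rotation, transfer, controller):
    """Disk service time as the sum of the components named in the text."""
    return queue + seek + settle + rotation + transfer + controller

def read_miss_response_time(t_cpu_read_miss, disk_times):
    """Response time of a read miss served by several disks in parallel.

    All disk I/Os are issued together, so the slowest disk decides
    when the request completes.
    """
    return t_cpu_read_miss + max(disk_times)

# Five disks with hypothetical service times in milliseconds.
print(read_miss_response_time(0.5, [8.0, 12.5, 9.1, 7.7, 10.0]))
```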



Figure 7. Components of read I/O response
time t_resp (read miss)



5.2.2. Timing of Write I/Os 

With NVRAM we assume that each write I/O is a fast write.
The response time in this case is given by the time the I/O
spent in the NVRAM queue waiting for sufficient NVRAM
to become available plus CPU overhead for a fast write.
Without NVRAM, each write is converted into four operations
(in RAID 5): a read and a write operation at the data
disk and a read and a write operation at the parity disk [3].
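
For reference, the four operations of a RAID 5 small write and the standard parity update can be sketched as below. In RAID 5 the second read/write pair targets the parity disk. The function names and the tuple representation of a disk operation are assumptions of this example, not the simulator's interface.

```python
def raid5_small_write(data_disk, parity_disk):
    """The four disk operations of a RAID 5 small write.

    Both reads must finish before the parity write, because the new
    parity is computed from the old data and the old parity.
    """
    return [("read", data_disk), ("read", parity_disk),
            ("write", data_disk), ("write", parity_disk)]

def new_parity(old_data, new_data, old_parity):
    """Standard RAID 5 update: new parity = old parity XOR old data XOR new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Stripe with data blocks 0x0f and 0xf0, so the parity is 0xff.
# Overwriting 0x0f with 0x3c must give parity 0x3c XOR 0xf0 = 0xcc.
print(new_parity(b"\x0f", b"\x3c", b"\xff").hex())
```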

In the simulation, we provide for the selection of one
of three different scheduling strategies. These are: AlwaysSteal,
in which we allow the scheduling of other requests,
i.e. the "stealing" of the disk arm, between the read
and write operations; StealOnParity, in which we only allow
stealing of the disk arm between the read and write of
parity; and NeverSteal, in which we disallow stealing the
disk arm between the read and write operations. See [7] for
more information.



6. Results 

To verify the validity and accuracy of our ana- 
lyzer/simulation approach, we conducted a brief compari- 
son study between our results and those of a traditional fully 
trace-driven simulation. For the role of a trace-driven sim- 
ulation, we used the trace analyzer. Since it is a component 
of Raidtool, the RAID model it uses is identical to that of 
the full analyzer/simulation approach. It is able to produce 
Table 1. Traces for simulation
Table 2. Configurations used



response time values and performs full cache and NVRAM 
modeling. 

We used four main traces for testing our tool, as shown in 
Table 1. Trace A was collected from a networked file server
during a period of peak activity. This trace contained a very 
large number of sequential threads. Trace B was collected 
from a DB2 database server running the TPC-C benchmark. 
Trace C was generated using a synthetic trace generation 
program. Trace D was a trace of on-line transaction pro- 
cessing activity collected from a database server. 

The various configurations we tested are shown in Ta- 
ble 2. Average response time results for both our tool and 
the trace-driven simulator are shown in Table 3. These
results are at the original I/O rate of the trace.

These results show that our tool can come quite
close to the trace-driven results at the original trace's I/O
rate. This may be enough for some purposes. However, 
our analyzer already provides a value for the response time 
results at the original I/O rate, so the main purpose of our 
analyzer/simulation approach is to provide quick results at 
higher arrival rates. 

Thus, we increased the arrival rate of the traces in order
to compare the sped-up results to those of our tool for some 
of the configurations. To do this, we used the simple method 
of decreasing the interarrival time of each I/O by the factor 
that the trace is being sped up. This method was preferred 
over more complex methods such as folding [6] because the 
latter will greatly change some of the workload character- 
istics determined by the analyzer, such as the logical seek 
distributions and cache hit ratios. 
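
The speed-up method can be sketched as a rescaling of the interarrival times, assuming the trace is given as a sorted list of arrival timestamps. The function name is hypothetical. The order of the requests is unchanged, which is why the logical seek distributions and hit ratios are preserved.

```python
def speed_up(timestamps, factor):
    """Divide every interarrival time by factor, keeping the first arrival."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        out.append(out[-1] + (cur - prev) / factor)
    return out

# Doubling the arrival rate halves each gap.
print(speed_up([0.0, 2.0, 6.0], 2))
```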
Table 3. Simulation results (response times in
milliseconds) 
Figure 8. Simulation results for Trc A, Conf 1



Figure 8 shows results for Trace A at configuration 1. 
These results show that the two methods are quite close at 
arrival rates close to that of the trace, then begin to diverge at
increased arrival rates. Figure 9 shows results for Trace A at
configuration 3. Despite there being few writes in this trace, 
Conf. 3 (RAID-0) shows a significant advantage over Conf. 
1 (RAID-5). 

Figure 10 shows the results for Trace C at configurations
4 and 7. Note that although the tool does not match the 
trace-driven results exactly, it shows sufficiently the relative 
difference between these two configurations. It indicates 
clearly that the 16 MB of cache improves performance sig- 
nificantly. 

Figure 11 shows the results for Trace D at configurations
1, 8, and 9. Here the curves are quite close and show the
significant advantage gained by adding 8 MB of NVRAM
(Conf. 8) over none at all (Conf. 1). However, increasing
NVRAM to 16 MB (Conf. 9) gains little improvement.
Figure 9. Simulation results for Trc A, Conf 3



Figure 11. Simulation results for Trace D
Figure 10. Simulation results for Trace C

7. Conclusion

Raidtool has been in use for approximately one year at 
IBM in a variety of situations. It consists of about 8500
lines of C++ code, and runs quickly enough to meet our
initial goal for a fast, easy-to-use tool. It produces quite 
accurate approximations of results at arrival rates close to 
that of the original trace, which become somewhat more 
inaccurate as the arrival rate is increased. Nevertheless, the 
results appear to be accurate enough to allow a user to select 
the most appropriate RAID configuration. 